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Abstract 

This paper studies Galerkin approximations applied to the Zakai equation of stochastic 
filtering. The basic idea of this approach is to project the infinite-dimensional Zakai equation 
onto some finite-dimensional subspace generated by smooth basis functions; this leads to a finite- 
d dimensional system of stochastic differential equations that can be solved numerically. The 

^> contribution of the paper is twofold. On the theoretical side, existing convergence results are 

extended to filtering models with observations of point-process or mixed type. On the applied 
side, various issues related to the numerical implementation of the method are considered; in 
I— I particular, we propose to work with a subspace that is constructed from a basis of Hermite 

polynomials. The paper closes with a numerical case study. 

(— I Keywords Stochastic filtering, Zakai equation, point processes, Galerkin approximation, 

Hermite polynomials. 

Lli AMS classification 60G35, 60H15, 65C30, 92E11 

1 Introduction 

in 

Stochastic filtering deals with the recursive estimation of the conditional distribution of a signal 
process X given some form of noisy observation of X. In the standard continuous time filtering 
^y-^ models this noisy observation is generated by a process Z with dynamics of the form 

o , 

^ Zt = Zo+ h{Xs)ds + Wt (1) 



X 



Jo 

for some Brownian motion W that is independent of X. In that case TTt{dx), the conditional 
distribution of Xt given a{Zg: s < t), can be characterized by a measure- valued stochastic 
partial differential equation (SPDE) known as Zakai equation. This SPDE is in general an 
infinite-dimensional equation that cannot be solved directly. In view of the practical relevance 
of filtering, a wide range of methods for the approximation of this equation by finite-dimensional 
systems and for the numerical solution of filtering problems in general has therefore been de- 



veloped; a good survey is given in Budhiraja, Chen, and Lee (2007) or in Bain and Crisan 



1(2009) Popular numerical methods for filtering problems include the extended Kalman filter 
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(Jazwinski (1970)1; quantization (Gobet, Pages, Pham, and Printems (2006)); Markov-chain 
appr oximation ([Dupuis and Kushner (2001) Di Masi and Runggaldier (1982) ); spectral meth- 



ods ( Lototsky (2006) I and simulation methods such as particle filtering ( Crisan, Moral, and 



Lyons (1999)). 



If the signal X is a diffusion process with uniformly parabolic generator the conditional 
distribution 7Tt{dx) admits a Lebesgue density that solves a SPDE in a suitable function space. 



the so-called Zakai equation for the unnormalized conditional density; see for instance Pardoux 
|(1979b) Galerkin approximations are an important numerical technique for dealing with this 
SPDE. The basic idea of this approach is to project the Zakai equation for the conditional 
density onto some finite-dimensional subspa-ce H^i generated by basis functions 61, ... , Cn- This 
leads to an n-dimcnsional SDE system for the Fourier coefficients of the solution of the projected 
equation; this SDE system can then be solved by numerical methods for "ordinary" SDEs. 

Theoretical and numerical aspects of Galerkin approximations are well understood for the 
case of pure diffusion observation as in Q; see for instance Germani and Piccioni (1984) and 



Germani and Piccioni (1987) for convergence results for Galerkin approximations and Ahmed 



and Radaideh (1997) or Ahmed (1998)| for a case study and a discussion of numerical aspects. 



Much less is known for the case of mixed observations of diffusion and point-process type. 
In this paper we therefore consider a model where a doubly stochastic point process N with 
intensity X{Xt) is observable in addition to the process Z. Models of this type arise naturally 
in credit risk modelling (see Example 2.1 below) or in the modelling of high frequency data in 



finance ( Frey and Runggaldier (2001 j[ Cvitanic, Liptser, and Rozovski (2006)). Outside the 
field of financial mathematics point-process information plays among others a crucial role in 
the analysis of queueing systems (Bremaud (1981) ). 

Our contribution is twofold. On the theoretical side we generalize the convergence results 
of Germani and Piccioni (1987) to the case of mixed observations |^ On the applied side we 
extend the numerical analysis of [Ahmed and Radaideh (1997) in various ways: to begin with, 
we propose to use Hermite polynomials as basis functions (instead of Gaussian basis functions); 
we explain how to change the basis adaptively in order to deal with sudden shifts in location 
and scale of the conditional density caused for instance by jumps in the observation, and we 
compare several methods for the numerical implementation of the SDE-system that results 
from the Galerkin approximation. An extensive simulation study shows that the Galerkin 
approximation works well for systems with mixed observation provided that the necessary care 
is taken in the implementation of the method. 

The paper is organized as follows. The model and the various versions of the Zakai equa- 
tion are described in Section [21 In that section we moreover introduce the basic form of the 
Galerkin approximation. Convergence results for the Galerkin approximation are given in Sec- 
tion [3l Section [4l deals with the numerical implementation of the model; results from numerical 
experiments are finally reported in Section [5] 



2 Zakai equation and Galerkin approximation 

In this section we introduce the nonlinear filtering problem studied in this paper. Moreover, we 
present different versions of the Zakai equation that describe the solution of the filtering problem. 
Finally we introduce the Galerkin approximation for the Zakai equation for the unnormalized 
conditional density and we derive an SDE system for the Fourier coefficients. 



4t 



In this context we mention the recent work Hausenblas (2008) where theoretical properties of a finite element 



approximation of certain SPDEs driven by a Poisson random measure of pure jump type are studied. 
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2.1 Model and notation 



We consider a filtered probability space (fi, T ^ F, P) where the filtration F = {J-'t)o<t<T satisfies 
the usual conditions and where T is an arbitrary but fixed horizon date. The nonlinear filtering 
problem we study consists of an unobserved state process X and observations Z and A^. Z is a 
nonlinear continuous transformation of X with additional Gaussian noise, while is a doubly 
stochastic Poisson process whose intensity is a nonlinear function of X. 

The state process. We consider an unobserved state process X on R'^ which is the solution 
of the SDE 

Xt=Xo+ [ b{X,)ds+ I a{Xs)dV„ 0<t<T, (2) 
Jo Jo 

for a m-dimensional F-Brownian motion V. Moreover, we assume that Xq has finite second 
moments and a density po G L'^(M.'^). Set a{x) = (j(,t)o"(.t)^. The components of a{x) and b{x) 
are denoted by aij{x) and bi{x), respectively. The restriction of the generator ^ of the Markov 
process X to C^{R'^), the set of all bounded and twice continuously differentiable functions on 
M.'^, is given by the second order differential operator 

Note that the Ito-formula implies that for / G C^{R'^), m/ := f{Xt) - /(Xq) - J*^f{Xs)ds 
is an F-martingale. 

The observation processes. The observation is given by the two processes Z and N. 
The process Z satisfies 

Zt= f hiX,)ds + Wt, 0<i<oo, (4) 
Jo 

where /i : M'' — is a measurable function and W is an ^-dimensional standard Brownian 
motion, independent of X. Moreover, the process A'' is a doubly stochastic Poisson process with 
intensity X{Xt) where A is a positive, continuous and bounded function, so that the process 

— J* X{Xs)ds is an F-martingale. We denote the jump times of A'' by ri, t2, 

The objective of nonlinear filtering is to find suitable ways for computing nt{dx), the con- 
ditional distribution of the state Xt given the observation history in a recursive way. More 
formally, let J^f'^ := (t(Z„,A^„ : < u < t), so that the associated filtration F^'^ repre- 
sents the information given by the observation. The conditional distribution of Xt given the 
observation until time t is determined by 

nt{f):^E{f{Xt)\Tf^''), / G 

The following regularity assumptions on the data of the problem will be used throughout 
the paper 

(Al) Assume that the following three conditions hold: 

(i) 6 : M** ^ ]R<^, a : K'* M<^><™, and h-.R^ ^M.'' are bounded on R^. Moreover, b is 
with bounded derivatives and a is with bounded first and second order derivatives. 

(ii) There exists a > 0, such that a{x)z > az^ Vx, z G . 

(iii) A : M'' — > [vui,ru2] is a continuous function for constants < vui < W2- 
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Example 2.1. Filtering problems with diffusive and point process observations arise naturally 
in credit risk modeling. This connection was studied systematically in |Frey and Runggaldier| 
1(2010)1 and [Frey and Schmidt (2012)[ among others. In these papers reduced-form portfolio 



credit risk models are considered where default times are doubly stochastic random times with 
intensity driven by some economic factor process X. In a large homogeneous portfolio the 
number of default events is thus given by some doubly stochastic Poisson process N with 
intensity X{Xt). In line with reality, it is assumed that investors cannot observe the process 
X directly, but are confined to noisy observations of X, modelled by a process Z as in Q. 
Moreover, they obviously observe the occurrence of default events and hence the process N. 

In this context the pricing of credit derivatives naturally leads to a filtering problem, as we 
now explain. In abstract terms a credit derivative with maturity T can be described in terms 
of some -measurable payoff H . Denote by Q the risk neutral measure used for pricing. The 
price of the credit derivative at time t <T is then given by Ht = ¥.^{H \ J-f'^) (assuming zero 
interest rates for simplicity). Using iterated conditional expectations we get 

i/i=EQfE«(i/ I I J-f'"^" 



It is well-known that the pair {X, N) is an F-Markov process. Hence for typical claims H 
one has the equality E''^{H\J^t) = h{t,Xt,Nt) for a suitable function h, and we get that Ht = 



Xt, Nt)\J-f''^) ■ The computation of this conditional expectation is a nonlinear filtering 
problem of the type considered in the present paper. 

For further information on incomplete-information models in credit risk we refer to the to 
the survey article Frey and Schmidt (2011)| 



2.2 The measure- valued Zakai equation 

The evolution equation for the measure TTt{dx) is usually deduced using a change of measure 
method. Define 

At:= n A(X,„-)-exp(^^ h{XsYdWs + \j^ \\h[XM''ds - j^{\[X,) - l)ds 

for t e [0, T]. Then the regularity assumptions in (Al) imply that (A^^)tg[o^T] is a nonnegative 
martingale. We define the measure by its Radon-Nikodym derivative dP" = X^^dV. The 
Girsanov theorem yields that, under P*^, Z is a standard Brownian motion, that is a Poisson 
process with intensity equal to one, and that X, Z and iV are independent. Denote by Yf := 
Nt — t the compensated Poisson process, such that under P°, F is a martingale. Then the 
conditional distribution nt{dx) has a representation in terms of an associated unnormalized 
version p: denoting by the expectation w.r.t. P*^, we obtain by the abstract Bayes rule for 
any / € L°°{W^) 

E°(/(X,)A,|J-f-^) _ ptif) 

"'^^^ EO(A,|.Ff'-) Ml) ■ 

It is well-known that the measure- valued process pt satisfies the classical Zakai equation: let 
Po(/) E[/(Xo)|J-o^'^]. Then, for any / e ^^(M'^), t G [0,T], 

Pt{f)=Po{f)+ f Ps{^f)ds+ f ps{fh^)dZ,+ f p,_(/(A-l))di;, (6) 



a.s., see for instance Theorem 3.24 in [Bain and Crisan (2009) (only continuous observa- 



tions). A formal proof that under (Al), (|6| holds in the setup of the present paper is given in 



Xu (2010) Theorem 2.9. 
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2.3 The Zakai equation for the conditional density 

Our aim is to determine the dynamics of the Lebesgue-density of the unnormahzed conditional 
distribution pt{dx). Consider the separable Hilbert space H = _L^(M'^') with norm || • \\h and 
scalar product (•, •). To obtain intuition, suppose that 

Pt{f) = {qtj) 

for all / e C^(M'^) with compact support and for some i/-valued process q = {qt)Q<t<T such 
that qt{-) can be identified with a smooth function. Denote by the differential operator ^* the 
formal adjoint of the generator As {qt,^f) = {^*qt, f) the measure valued equation (|6| 
simplifies to 



{qtJ) = iqoJ)+ / {^*qsj)ds+ / (/i ' g„ + / ((A - , /)dy,. (7) 

Jo Jo Jo 

This suggests that q solves the stochastic partial differential equation (SPDE) 

dqt = ^*qtdt + h^qtdZt + (A - l)qt-dYt 
in an appropriate sense. The next step is to give this equation a precise mathematical meaning 



using the theory for mild and weak solutions for SPDEs as in Peszat and Zabczyk (2007) 
Besides the Hilbert space H = L'^{W^) wc consider the Sobolev space V = H^{R'^) C H. We 
define an extension A* of ^* with domain D{A*) C F as follows: m g is an element of 
D{A*) if there exists f e H such that for all u e ^ 



and we set A*u — f in that case. If u e Cg(M'^), we obtain that / — by checking that 

(/, v) = (it, I£v) with integration by parts. It is well-known that A^ generates an analytic 
Co-semigroup G*, see Da Prato and Zabczyk (1992, Proposition A. 10) (A Co-semigroup G* is 
a map from [0, T] into L(ff, U) such that G*(0) = id, G*(t + s) = G*{t)G*{s) and so that G* 
is continuous in the strong operator topology.) 

Mild and weak solutions. Let A/'^(0, T; H) denote the set of all F^^^-adapted, i?- valued 
processes ^ = (Ct)o<t<T, continuous in the mean square norm, which are such that 

|C|t:= ( sup E°(\\m\\H)Y' 



It is well-known that Af^{0, T; H) is a Banach space with norm | • \t, see Germani and Piccioni 
1(1987) 



Define the multiplication-operators B: H ^ H\Bf := fhJ and C: H ^ H, Cf := {X- l)f. 
A mild solution of the SPDE 

dqt = A*qtdt + BqtdZt + Cqt-dYf. (9) 
is a process q e JV^{0, T; H) such that 

qt = G,*go + /* Gl_^BqsdZ, + /* G:_,Cg,_ , t< T. (10) 
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Denote by A :~ {A*)* the adjoint operator of A* and note that on Cq{M.'^) the operator A 
coincides with the generator J^f oi X. A weak solution of the SPDE (|9]) is a process q e 
A/'2(0, T; H) such that for all v D{A) 

(gt,i;) = (go,«)+ / {qs.Av)ds+ f {qs,Bv)dZs+ f {qs-,Cv)dYs, t<T. (11) 
Jo Jo Jo 

In our context g is a weak solution of ([9| if and only if it is a mild solution of that equation; 



this follows immediately from Theorem 9.15 in Peszat and Zabczyk (2007) 



The Zakai equation. The following result describes the evolution of the density of the 
unnormalizcd conditional distribution pt{dx). 

Theorem 2.2. Assume that (Al) holds. Then for all q^ £V there is a unique mild solution q 
of the SPDE Moreover, qt E H^{R'^) and for all f £ L'^{R'^) we have that pt{f) = {qt, /). 

In view of this result, equation ([9| will be called the Zakai equation for the unnormalized 
conditional density. 



Theorem 2.2 has been obtained in Pardoux (1979b) and in Germani and Piccioni (1987) for 



the case of pure diffusion information and in Pardoux (1979a)| for the pure Poisson case {h = 0). 



The extension to the case of mixed observations may be found in Xu (2010) 



2.4 The Galerkin approximation 

The Galerkin approximation for a (stochastic) PDE essentially projects the equation to a finite- 
dimensional subspace. In the case of the Zakai equation for the unnormalized conditional density 
the solution of the projected equation can be characterized in terms of a finite-dimensional 
system of ordinary stochastic differential equations (SDEs), as we now explain. 

Formally the Galerkin approximation is defined as follows: Let {ei, 62, . . .} C D{A*)r\D{A) 
be a basis of the Hilbert-space H. Let Hn be the linear subspace spanned by {ei, . . . , e„} and 
denote by P„ the projection from H to iJ„. We define the projection of the operator A* by 



n J 



(^*)(«) P„A*P, 
the operators /B'^"-' and C*^"^ are defined analogously. 

Definition 2.3. The n-dimensional Galerkin approximation of (|9| is the solution of 

dqf) = {A*y^^Ui"'^dt + B^^Ui"''^dZt+C^"'>qt^dYt, 

(") p ^^^^ 

qo = Pnqo- 



As previously, there are two equivalent concepts of solutions. The mild solution of ( 12 ) is 
obtained with (G*)'^"-' := exp(^*)("^. On the other side, the weak form is obtained using the 
adjoint operator := ((y^*)("))*. Since for u,v & H one has {PnA*PnU,v) = {u, PnAPnv) 
the weak form of the Galerkin approximation ( |12p becomes 

d(gf t;) = {ql'^\ P^APnv)dt + (qt, PnBPnv)dZt + {qt-, PnCPnv)dYt , veH. (13) 

Note that for v e we obtain that the differential d{ql^\v) is equal to zero. Since 
moreover q^^^ = Pnqo G P^n it follows that q^^^ G Hn for t G [0, T] P-a.s. Hence, g^"-* can be 
written as 

n 

(Z^)(x)=^Vf^(i)e.(a:), te[0,T], (14) 
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where ^^"^ , 1 < i < ^t- are called Fourier coefficients. Plugging ( 14 1 into the weak form of the 



Galerkin approximation (13), we get that the Fourier coefficients satisfy the following system 
of ordinary SDEs: 

n n In 

2^1 i^l 1^1 i^l 

n 

+ [Y.^t'\t-)[e,,{\~l)e,))dY,. 



Define the n x n matrices A, C, D and B ,£ = 1, . . . , / by their components: 

a-ji := {ei,Aej), 6^,^ := {ci^h^Cj), cji := (e;, (A - l)ej), dji := {ei,ej). 



(15) 



As {ei, 62, . . . } is a basis of H, the matrix D has full rank and is invertible. Using matrix nota- 
tion we obtain the following SDE system for the vector- valued process T*^"^ := (?Ai"\ . . . V'n"'')^, 



(") 



-(") 



D-UAT[''^dt 



Y^B'T^r^dz', 

1=1 



(16) 



This SDE system will be the starting point for our numerical analysis in Section [4j Note 
that for {ei, 62, ... } smooth, one has aji = {ei,^ej) which is more convenient for computing 



the coefficients of the system (16). For the case without point-process observation the SDE- 



system 16 was already proposed by Ahmed and Radaideh (1997) 



Moments of the conditional distribution. Obviously, the (normalized) conditional 
density of nt{dx) can be approximated via 



Pt 



in) 



J^,qtix)dx J^^ q[^''\x)dx 



--■■Pi 



in) . 



(17) 



here « means that we approximate the term on the left side by the Galerkin approximation 
on the right side. In this case we have that E{f{Xt)\J-f''^) ~ {p[^\f). On the other side, we 
can represent some characteristics of the conditional distribution directly via qt. Consider for 
simplicity the case d = 1. Denote by Xt and CTj be the conditional mean and variance of the 
state process at time t S [0,T]. Then 

Xt =E(Xt|J"^'") = /^*(^)^^ ^ Jxq'i"'\x)dx _ ELi V'l"'(0(a;,ej) 



Jqt{x)dx Jq^\x)dx Er=i#^W(l,e.)' 



(18) 



Note that the second equality follows from the definition of the unnormalized distribution, see 
For the last equality we used (14). In a similar way we approximate in af = E((Xt — 
Xt)'^\J-^'^) = E{Xf\T^'^) — {xtf the conditional second moment by 



E^=lV'^^w(x^e.) 



(19) 



Analogously all moments of the conditional distribution can be represented by the Fourier 
coefficients. Notice that (1,6^), (x, e^) and (a;^,ei) are independent of the observation and 
can be computed off-line (we implicitly assume that these integrals exist for the chosen basis 
functions). 
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3 Convergence results 



This section gives sufficient conditions for the convergence of the Galerkin approximation q^") 
defined in ( 14 1 to the solution of the Zakai equation q from ^ in an appropriate sense. The 
following theorem is the main theoretical resufi of the paper: 

Theorem 3.1. Assume that (Al) holds. Let q he the solution of the Zakai equation in ([9| and 
he the corresponding Galerkin approximation. Then, for any qo G V , 



sup E'iU-^ 
te[o,T] 



Qt\\H) 



0, 



as n 



oo, 



if and only if, for any x ^ H , 



lim sup 



' eMPnA* Pnt) - G;)x 



0. 



H 



(20) 



Note that G^x is the solution of the Kohiiogorov forward PDE with initial condition x (the 
PDE describing the evolution of the transition density of X) and exp(P„y^*P„t)a; is the Galerkin 



approximation to this (deterministic) PDE. Hence Theorem 3.1 shows that the Galerkin ap 



proximation for the Zakai equation converges if and only if the Galerkin approximation for the 
deterministic forward equation converges. 



Necessary and sufficient conditions for ( 20 ) to hold can be obtained by means of the Trotter- 



Kato theorem. A convenient condition that ensures (20) under (Al) is that 



\^ Hn is dense in V; 



(21) 



see Theorem 4, Germani and Piccioni (1984) 



Proof of Theorem 13.11 



The remainder of this section is devoted to the proof of Theorem 3.1 The essential part of the 
proof is a continuity result for the mild form of the Zakai equation, see Proposition |3.4| below. 



This result is an extension of a result from Germani and Piccioni (1987) where the case of 
continuous observation is treated. We recall the mild form of the Zakai equation in the Banach 
space M\0,T;H), qt = G^o + f^G^.BqsdZs + Gt_,Cq,-,dYs, t < T where for f e H, 
Bf = h^f and Cf={X- l)f. 

We start by introducing some necessary operator spaces. By S we denote the space of all 
Co-semigroups of linear bounded operators from H to H such that there exists S E M+ with 



sup \\St\\ <S foraU^e^. 

te[o,T] 



(22) 



We endow S with the topology of uniform strong convergence on [0,T], i.e. a sequence (S'*^"-') 
in S converges to 5 G 5 if for all x £ H 



lim sup 



{si-^-S,)x 



= 0. 



H 



For any I e N denote by iY' the space of linear bounded operators from H to (/-fold 
product of H). In the special case / = 1 we write U =U^. An operator AeU^ can be written 
component-wise: for all x G H, 



Ax={A^x,...,A^x)'^ 
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with A' e U. The space U'' is endowed with the strong topology, that is a sequence (A^")) in 
converges to A eU\ if for all x £ H 



hm 



= 0. 

H' 



The studied SPDEs. For the proof we study a more general class of linear stochastic 
partial differential equations that includes the Zakai equation ( 10 1 as a special case. Consider 
a generic semigroup S £ S and generic linear operators B £U\ C (zU and some f £ H. In the 
sequel we study the following equation in Af'^{0, T; H): 



Jo Jo 



(23) 



The following decomposition of this equation is the starting point for our analysis: define 
the linear operator L on Af^^O, T; H) by 



mm-.^ I St-sB^sdZs+ f St-.C^s-dYs (24) 
Jo Jo 

for all t G [0,T] and ^ € H. Furthermore, set ^f' := Stf such that e Af'^{0,T; H). We 
obtain that ( [23| can be rewritten as the following equation in A/'^(0, T; H) 

e = e'"' + Lc (25) 

The operator L is a bounded linear operator and it is moreover quasinilpotent, as the following 
estimate shows. 

Lemma 3.2. Set VTS{\\B\\^ + \\C\\^)^ . Then, for all n e N 

7 
(n!) 

The proof is given in Appendix \K\ 
Lemma 3.3. Equation (|25[) has a unique solution in Af^{0,T; H), 



i"ll"<^4i- (26) 



e = (/-i)-ie[°l:=E^"^'"'' (27) 
and {I~L)~^ : N'^{Q,T:H) Af^iO^T; H) is a bounded linear operator: \\{I - ly^W < k. with 

Proof. The crucial part in the proof of the lemma is the estimate 



Ew'.i:^^i:.-'|f.f(f.-)(£'^^> 

n— n— ^ ^ n— ^ ^ 



2n 



n=0 



which shows that the Volterra series X]r=o does in fact converge as n — > oo. □ 
In view of Lemma 3.3 we can define the mapping F: HxU^xUxS—?' Af'^{0, T; H) by 

F{f,B,C,S) 

where ^ is the unique solution in Af^{0, T; H) of ( [23| with coefficients (/, B, C, S). The following 
result shows that F is continuous. 



9 



Proposition 3.4. Consider sequences (/(")), (S'")), (C*")) and (S-f")) in H, U\ U and S, 

converging to f E H . B C eU and S £ S, respectively. Then, 



F{f<^''\ s-H) _ F{f, B, C, S) 



—^0, as n — >■ oo. 



Proof of Proposition \3.4\ Since 5^"^ -> 5 , B^"'> — > B and C^"^ -> C, by the uniform bounded- 
ness principle there exist N and a constant 7 such that 



sup 

t6[0,T],Ti>iV 



{||5,|| V V V V ||C|| V < 7. (28) 



In the following, we only consider sufficiently large n > N. Set 

£.:=F{f,B,C,S), :=F(/("),B("),C("\S'(")). 
Together with := S'i"^^"^ we define L(") by 

f st\B^-\^,)dZs+ f St\C^-H^,.)dY, 
Jo Jo 

for all ^ e 7V^(0, T; i/). Then, by the very definition of F, 
Hence 



By Lemma 3.3 there exists a constant k — k(7), such that 



(29) 
(30) 



Furthermore, as S*-"^ £ 5, 

|^[0,(")] _^[0]|2 



— sup 
te[o,T] 

<2 sup 

tG[0,T] 



5(n)^(„) _ 



+ 2 sup 

H te[o,T] 



<2f ||/(«)-/||2, + 2 sup (5^^-50/ 

te[o,T] 



2 



H 



The last term converges to zero as (Z^"-*) and (S**^"^) converge to / and S, respectively. 

Finally, we show that (L(") - L)^ converges to zero. From the definition of L and i^"^ we 
obtain by the Ito-isometry that 



|(L(")-L)^||= sup E"(||((i(")-L)e)(t)||l, 
te[o,T] ^ 

sup E"( f {stlB^-'^ - St-sB)t 

E[0,T1 \J0 



te[o,T\ 
< 2 



2 




ds + 


f 




Jo 



sup E° 
te[o,T] 



5(:!i(B(")-B)6 ds)+supE°(/ {Sti-St-s)Bi 



te[o,T] ^Jo 



ds 



H' 



+ sup E"( / stW''^ - C)^s ' ds) + sup E"( / [sill - St-s)C^s ' ds) 



te[o,T] ^Jo 
2(i;i+i;2 + -B3 + -B4). 
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We consider the terms Ei to E4 separately. Observe that by (22 1, 



As B*^"-* converges to B, we have for all t e [0, T] and cj e il 



dT 



In order to show that E'l — >■ as n — > 00 we apply dominated convergence. Since ||B|| < 

7 we get 



and the last term is integrable since jj" IICslll/'^s) ^iClg 
In a similar way 



< 00. 



E2 < sup E° ( / sup 



te[o,T] 



s<T<T 



H' 



ds 



=E' 



'( r sup - 

^Jo s<T<T 



ds 



H' 



while uniform strong convergence of S'^"-' gives supj.<^<7n 

w G r2. As 



H' 



(31) 



-> for all 



E° 



/"^ sup - Sr-s)B^s ' ds) < 4f E°( UsWUs) < ^I'm 

Jo s<T<T W / \Jq / 



and \^\t < 00 by Lemma 3.3 we obtain again by dominated convergence that E2 — 0. Analo- 
gously we obtain E3 ^ and i?4 — > and we conclude. □ 

Finally we turn to the 



Proof of Theorem 3.1 Under Condition (201 the assumptions of Proposition 3.4 are clearly 
satisfied for the Galerkin approximation of the Zakai equation, as PnX — > x, PnBPnX — Bx 
and PnCPnX — >■ Cx for all x E H. For the proof of the converse statement (the fact that (20 1 
is also necessary for the convergence of the Galerkin approximation) we refer to the proof of 
Theorem 6.1 in Germani and Piccioni (1987) □ 



4 Numerical methods 



In this section we discuss various aspects of the practical implementation of the Galerkin ap- 
proximation for the Zakai equation. We begin with a few algorithms for the numerical solution 
of the SDE system (16 1. In Section 4.2 we consider the special class of basis functions con- 



structed from Hermite polynomials. In Section |4.3| we finally show that the efficiency of the 
Galerkin approximation can be improved substantially if the scale and the location of the bases 
are changed adaptively. 
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4.1 Numerical solution of the Zakai equation 



In order to solve the SDE system in ( 16 ) numerically, we discretize the system in time. As nu- 
merical schemes we consider the Euler-Maruyama method and the splitting-up method. While 
the Euler-Maruyama method is fast to implement, it can become quite unstable if the time step 
is relatively large (see Figure |3|. This difficulty can be overcome with the splitting-up method. 
Note that in practical filtering problems the observation often comes at discrete time points, so 
that the time-discretization step can not be chosen arbitrarily small. 



Our aim is to approximate Equation (16). It will be convenient to use the process N as 
driver (instead oiY = N — t). Rewriting Equation (16) leads to 

I 

dr[") =D-^(^{A-C)T'i''Ut + Y,B'T['''^dZt + CTt^dNty T^"^ = (7^"\ (32) 

Consider some the equidistant partition = to < < • • • < ^a' = with step size A := T/K. 
In the sequel we consider n and the partition fixed and denote the approximation at the time 
point tk,0<k<K, by Tfc = {'4>k,i, V'fe.n)'- 

Euler-Maruyama method. The Euler-Maruyama method (EM method) generalizes the 



Euler method to stochastic differential equations, see e.g. McLachlan and Krishnan (1997) It 
is described in the following algorithm: 

Algorithm 4.1 (EM method). For k = 1, . . . , K , compute from Tfc_i by 

I 



Splitting-up method. The splitting-up method (SU method) is a numerical method based 
on semigroup theory. It decomposes the original SDE into stochastic and a deterministic equa- 
tions which are easier to handle. We refer to Bensoussan, Glowinski, and Rascanu (1990) and 
Le Gland (1992) [ for further details in the case of continuous observations. Here we propose an 



extension of the method to the case with mixed observations. For simplicity, we assume that 
the basis {ei, 62, ... , e„} consists of orthonormal functions so that D = — In- 

Intuitively, the SU method computes from Tk-i in three steps: the first step uses only 
the dt-part of equation (32) and returns the solution of the SDE dT} = {A — C)T\dt. The 
solution of this equation on [tfc-i,tfc] is the matrix exponential T^^ — exp {{A — C)A)T(^_^. 
Step 2 incorporates the new information from Z via the linear SDE dT^ = BT^dZt with initial 
condition — T]^. The solution of this SDE is given by the matrix exponential 

= exp f ^ {b\Z,, - Z,,_J - l{BrA))Tl 



The new jump information is incorporated via the linear equation dT^ = CT^_dNt, this time 
with initial condition Tj _ — Tj , which gives 



" tk-i tk 

T?, = (/„ + C)(^'^-^*-^)Tt 
These steps lead to the following algorithm: 

Algorithm 4.2 (SU method). For k — 1, . . . , compute from Tfc_i by 

(1) Compute Tl := exp {{A - C)A)Tfe_i. 

(2) Compute := exp ( - J - ^(S')'A))Ti. 

(3) Return := (/„ + C)'^'^'^-'^'^-^Hl 
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4.2 Galerkin approximation based on Hermite polynomials 

The choice of the basis functions has a large impact on the quahty of the Galerkin approximation. 



Ahmed and Radaideh (1997) propose to use Gaussian series, i.e. a series build by densities of 
n-dimensional Gaussian distributions with different means and arbitrary positive, symmetric 
covariance matrices. It is shown that these are linearly independent and complete and hence 
they can be used to construct Galerkin approximations as described above. 

In this paper we instead consider a basis computed from Hermite polynomials. This basis 
has a number of computational advantages over Gaussian series as will become clear below. We 



start by recalling some properties of Hermite polynomials, see e.g. Courant and Hilbert (1968) 
The ith Hermite polynomial is defined by 

/,(x) = (-l)V-'/2^e--'/2, xeR (33) 

with i = 0,1,2, It follows that /o(a;) = 1, fi{x) = x, f2{x) = x"^ — 1 and /3(a;) — x^ — 

Zx. The Hermite polynomials are orthogonal with respect to the weighting function (j}{x) := 
(27r)~^/^ e""" as /jj fi{x)fj{x)(l){x)dx — i\ Consequently, the functions ei, 62, . . . given 

by 



e.(a:):=^^^^/.-i(a:), xGR, (34) 
constitute an orthonormal basis of (M) , which we call Hermite basis. In the following result 



we deduce the convergence of the Galerkin approximation with the use of (21 ). 

Proposition 4.3. Assume that (Al) holds and that g^"'' is the Galerkin approximation of the 
Zakai equation with respect to the Hermite basis. Then, for any qq G V , 

sup E°(||gt^"^ -qtlll^) ^ 0, OS n^oo. 

te[o,T] 

The proof is given in Appendix lAl To actually obtain the Galerkin approximation under 



the Hermite basis one computes the coefficient matrices A, , . . . , B'' , C as in ( 15 1 with respect 



to the Hermite basis (the matrix D is the identity matrix as the Hermite basis consists of 



orthonormal functions) and then solves ( 16 ) numerically by one of the methods described before. 
In some special cases it is even possible to obtain explicit formulas for the entries of the coefficient 
matrices, as is illustrated in the following example. 

Example 4.4 (Kalman filter with point process observations). Consider d = m = I = 1 and 
assume that b(x) — bx, that 17(2;) = a and that Xq ^ ctq), so that 

= + / bXsds + aVt 







is a linear Gaussian process with generator ^/(x) — bxf'{x) + ^f"(x). Assume moreover that 
h and A are of the form h{x) — hx and A(a;) = Xx^ with A > such that the observation is 
given by Zt = hXgds + Wt and by the doubly stochastic Poisson process TV with intensity 
{\X^)t>o. In the next lemma we give explicit formulas for the coefficient matrices A, B and C . 
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Lemma 4.5. Consider Then with h{x) ~ hx and \{x) = \x^ we obtain 



+ ^{l-2i) fori=j 



a,^^{e,,^e,) = {{-I + ^)^/jUTT) fori=j + 2 (35) 



'b 

-.2 ' 8 



I u(\ \ \^^^ fori=j + l 

bj, = {e„h{-)ej) ^ i (36) 

[hy/j-1 fori^j-1, 

'A(2j-1)-1 fori^j 
(e„(A(-)-l)e,)-<'Av/J(7TT) fort=j + 2 (37) 



.AV0'-l)(j-2) fort=j-2, 



and zero for all other cases. 



We give the proof in the Appendix In order to set up the Galerkin approximation one 
moreover needs to project qq on the subspace 77„ generated by the Hermite basis. This is done 
with some additional notation in Lemma 14.81 below. 

We mention that explicit computation of Oji = {ei^Jifej) is possible also for other state 
processes with different generator such as a CIR process. 



Computation of moments. Recall from ( 18 1 that in order to compute mean and variance 



of the filter distribution via Galerkin approximation one needs to determine be integrals {x^ , Ci). 
For the Hermite basis this can be done analytically. In order to present the corresponding 
formulae we introduce some additional notation. The z-th Hermite function is an polynomial of 
order i, and we denote by z?q, . . . , i?* the coefficients in the representation fi{x) = X)fc=o ^fe^'^- 
Conversely, any power of x can be represented as linear combination of Hermite polynomials 
and we write — J2k=o '-kfkix)- Now we have 

Lemma 4.6. For the Hermite basis it holds that 

(-^eO = 4^£^■^'2nr^ J =0,1,... (38) 
V k=o 



The proof of this lemma is given in Appendix |A] 



Example 4.7 (Example 4.4 continued). Using the above notation we may also compute the 



projection of the initial density for the case of the Kalman filter with point process observation: 
Lemma 4.8. Set a := b := x/l^ and d := |4 - Then 



k — m i^rn m 



We give the proof in Appendix |^ 
4.3 The adaptive Galerkin approximation 

During the filtering process the conditional distribution TTt{dx) typically changes location and 
scale. This can create problems for the Galerkin approximation with a fixed basis. For instance, 
the graphs in Figure [T] show that while the standard Hermite polynomials do approximate the 
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-10 








6 a 



Figure 1: Comparison of tlie density p of a normal distribution with mean /i and variance with 
its approximation p = for different choices of /i and a with n = 20: the graphs show 

the distance d :— (J {p~p)^dxY^^ as function of ^ and as function of a with fixed cr = \/2 (left) and 
= (right). The approximation is bad if ^ (—5,5) (left) or cr ^ (0.9,2) (right). The adaptive 
Galerkin method overcomes this difficulty. 



density of a normal distribution well if the mean is close to zero and if the variance cr^ lies 
between one and four, the fit becomes substantially worse if /i is substantially different from 
zero or if is outside of the interval [1,4]. Hence we propose an adaptive scheme, called 
adaptive Galerkin approximation (AGA), which improves the numerical performance of the 
Galerkin approach significantly. 

Assume for simplicity that I = 1 and that the basis {e^} C D{A*) of H consists of or- 
thonormal functions. We consider the equidistant time discretisation given by tk = kT/K, 
k = 0, . . . , K . The standard Galerkin approximation computes at each time tk- The AGA 
additionally adapts the location fik and the scale tJfc > of the basis by choosing appropriate 
values for these parameters at every time step. Hence the method works with the adapted basis 
{ei',e^,...} given by 



1 



(39) 



Similar to (15), we denote by A , B , C and D the matrices given by 



4, = (e^^e^), b^^ielhe^), ^ = (ef , (A - l)ep, 4 = (e^e^). (40) 

In algorithmic form the adaptive Galerkin approximation can be described as follows: 
Algorithm 4.9 (AGA). 1. Initialization: 

1/2 

i) Set fiQ and (Tq using the initial density: fio = J xpo{x)dx and co = ( / {x — i^o)^Po{x)dx) , 
and define the basis functions e°, 1 < z < n, as in (39). 



u) Compute A°, B°, C° and D° according to Q. 
in) Compute Tq = (V'o.i, ■ • ■,i>o,n) by (Im]): Vo.i = (Po,e°). 
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2. Iteration: For k — I, . . . , K do the following steps. 



i) Compute — {ipk,!, ■ ■ ■ , 4'k,n)' from Tfc_i applying Algorithm 4.1 or 4.2 using the basis 
functions e^^^ , 1 < i < n. 

ii) Compute the following estimates of the conditional mean and standard deviation: 



If |a;^"^ — and |<5'^"'' — crfc_i| are smaller than a given threshold, set = T^;, 

Mfe — fJ-k-i, — CTfc-i, e*^ := e*^^^, 1 < i < n. Let k = k + 1 and continue with the 
iteration (Step 2). 

iii) Otherwise do a transition of the basis as follows: let /ifc := cr^ := o'^"'', define the 



new basis functions as in (39) and compute the matrices A'^, B'', C*' and according 
to (40 1. Finally, compute by projecting qt^ — X]"=i V-'fc.jej'^^ on the new basis: let 



^fe,i = (^tfci ef ) and set = (V"*:,!, . . . , V'fc.n)'- Let fc = fc + 1 and continue with Step 2. 

The AGA provides better results compared to the standard Galerkin approximation (see the 
numerical experiments in Section [5] below) while it is typically more time consuming since the 



coefficient matrices in (40) need to be recomputed at every transition of the basis. However, in 
the AGA with respect to the Hermite basis the corresponding terms can be computed explicitly 
which leads to an efficient implementation of the AGA. In particular, when the coefficients 6, 
cr^, h and A are of polynomial type, the corresponding coefficients can be computed explicitly 
with the aid of Lemma 14.61 



4.4 The multi-dimensional case 

In this section we shortly sketch the extension to the multi-dimensional case. For an introduction 
of multi-dimensional Hermite polynomials, we refer to Berkowitz and Garner (1970)1 Here, we 



proceed by the following method: let {ei, 62, ... } denote the Hermite bases defined in Equation 
( 34 ) . This constitutes a basis of L'^ (M) . Hence, 



jcij (g) (g) • • • (8) e^^ : ii, 12, ... id £ n| 

is a Hilbert basis of L^(K'^), where the tensor product is defined by [e® f){xi,X2) ■— e{xi)f{x2)- 
We choose the following m = n'^-dimensional subspace iJ^ 

Hm = spanjeii (g (g) • • • (g) e.j^ : «i,i2, . . .id € {I, ■ ■ ■ ,?^}|. 

From now on the (adaptive) Galerkin approximation works similarly as in the one-dimensional 



case, compare for instance Algorithm 4.9 



5 Numerical Experiments 

In this section we present results from a number of numerical case studies. The aim is to assess 
the performance of the Galerkin approximation relative to other methods (mostly particle filters) 
and to illustrate various practical aspects of the method such as the pros and cons of different 
basis functions and discretization schemes. 
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General description. The basic setup of each numerical experiment is as follows. In Step 1 
a trajectory x = {xt)o<t<T of the signal process ([2]) was generated using the Euler-Maryuama 
method. In Step 2 we generated for the given trajectory x from Step 1 a trajectory z of the 
continuous observation Q and a trajectory n of the point process observation N. In Step 3 
various variants of the Galerkin approximation were used to solve the corresponding Zakai 
equation for the conditional filter density. For comparison purposes the filter problem was also 
solved using a particle filter. 

The performance of the numerical filtering algorithms was assessed in different ways: 

• By design, the mean xt of the filter distribution at time t minimizes the i^-distance 
between the unobserved state Xt{u!) and the space J"f'^,P). This suggests the 
following performance criterion: Generate m independent trajectories , , 1 < j < 
m and solve numerically the ensuing filter problem for discrete time points ti, . . . ,tK- 
Compute the so-called root mean square error (abbreviated RMSE) given by 

^ m K 1 

]=i fc=i 

Obviously, a filtering method that leads to a smaller RMSE can be considered to be more 
accurate. 

• We can plot individual trajectories x, x and a'^ of the signal, and of the conditional mean 
and variance of the filter distribution. This allows for a pathwise comparison of different 
numerical methods. 



Finally, in some cases (e.g. Kalman-filtering for linear Gaussian models) the filter den- 
sity is known explicitly. In those cases we can compare the filter density pt(-) and the 
approximation p|"-'(-) that is obtained by normalizing the numerical solution of the Zakai 



equation (see (17)) 



Our numerical experiments with a one-dimensional signal process use the setup of Exam- 



ple 4.4 In this case X is a one-dimensional Ornstein-Uhlenbeck process with mean-reversion 
parameter b and volatility a and is a doubly stochastic Poisson process with intensity X{Xt). 
The functions h{-) and A(-) are of the form h{x) = hx and A(a;) = Aa;^. The parameter values 
are as follows: b — 0.5, a — 2, and the initial distribution of X is normal with /i = 5, cr^ = 0.01. 
Unless stated otherwise we took a time step A = 10^^ (essentially continuous observations). 
The values of h and A vary with the experiments and are hence given in the captions of the 
graphs and tables. 

We also considered the case of a multidimensional signal process of dimension d = 5. We 
assumed that the signal process has dynamics dXt = bXtdt + adVt for a 3-dimensional Brownian 
motion V. The observation process is three-dimensional and h(x) — hx. The matrices b, a and 
h are as follows: 
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0.2 0.3 0.2 0.3 0.4 
0.2 0.1 0.2 0.1 0.2 
0.2 0.2 0.4 0.2 0.2 



and iV is a one-dimensional doubly stochastic Poisson process with intensity ^.\{X\)'^^Q.2{jCIY' 
^.^{XlY + ^■^{Xff' -I- O.l(Xj^)^. The basis functions were chosen as described in Section [44 



Results. In the following we summarize the key findings from our numerical experiments; 
the outcome of each experiment is described in detail in the captions of Figures |2] - Figure [7] 
below. 
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(i) For a one-dimensional state variable process the adaptive Galerkin approximation per- 
forms very well: given a sufficient number of basis functions tlie precision is equal to the 
precision of a particle filter, but the computation time is significantly lower. This can 
be seen from inspection of Table [T] where we give the RMSE and the computation time 
for various filtering algorithms and parameter values. The performance of the Galerkin 
approximation and the additional value for the estimation of X obtained by incorporating 
the observation of the point process N is further illustrated in Figure [2] In Figure [3] we 
consider the special case of the Kalman-Bucy filter (no point process observations) . Here 
the filter density is known explicitly and we can compare the approximation obtained via 
Galerkin approximation to the correct density. The figure clearly shows that the Galerkin 
approximation provides a good approximation to the overall density (and not just to the 
conditional mean xt). 

(ii) The adaptive Galerkin method can bring a substantial performance enhancement if we 
consider examples with small observation noise and hence with rapidly moving scale and 
location of the filter distribution. This is illustrated in Figure [5] 

(iii) While computationally more involved, the splitting-up approximation is significantly more 
accurate and stable if the time-discretization step A is moderately large, as is clearly shown 
in Figure [6] At this point we would like to stress that in many applications of filtering, 
observations arrive at discrete time points such as daily observations, and that one resorts 
to continuous-time filtering methods merely for convenience. This implies that A cannot 
be freely chosen by the analyst and it is important to have numerical methods that are 
robust with respect to the choice of A. 

(iv) Figure [t] for the case d — 5 indicates that the Galerkin approximation works reasonably 
well also for a higher dimensional signal process. However, the number of basis functions 



increases exponentially in d (at least for the basis chosen as in Section 4.4). It would be 
interesting to see if a further performance enhancement is possible if we choose a different 
basis, but this is left for further research. 

Comparison of the Gaussian and the Hermite basis functions. Finally we 
briefly discuss the pros and cons of Gaussian and Hermite basis functions. 

The main advantage of the Hermite basis is clearly the fact that the form an orthonormal 
system. This facilitates the numerical implementation of the method, as there is no need to 



invert the matrix D introduced in (15). Moreover, the orthogonality of the implies that 
the subspace spanned by the first n Hermite basis function is in some sense 'larger' than the 
subspace spanned by the first n Gaussian basis functions, so that it is possible to obtain a 
similar level of accuracy with a smaller size of the basis. Table [2] shows that, at least for 
the case of the Kalman filter with point process observations, these advantages are not only 
theoretical: in order to reach a similar level of accuracy the methods based on Hermite basis 
functions require a substantially lower computational time than the algorithm with Gaussian 
basis functions. A second advantage of Hermite polynomials is the fact that for certain models 
it is possible to compute the coefficient matrices for the Galerkin approximation analytically, 
as was illustrated in Example |4.4| This is particularly useful if one wants to use some form 
of the adaptive Galerkin approximation, since in that case the coefficient matrices need to be 
recomputed during the filtering procedure. 

As mentioned in Ahmed and Radaideh (1997)[ the use of Hermite basis functions might in 



principle lead to negative values for the conditional density which is clearly undesirable. To this 
we mention that we never encountered the problem in our numerical experiments unless we took 
the number of basis functions extremely small. But we readily admit that in order to exclude 
the possibility of negative filter densities a priori, one has to resort to a set of nonnegative basis 
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Na/Np 


5/20 


10/50 


15/100 


AGAH(EM) 


0.63 (0.1s) 


0.42 (0.1s) 


0.42 (0.1s) 


AGAH(SU) 


0.65 (2.4s) 


0.43 (3.1s) 


0.43 (3.9s) 


PF 


0.46 (9s) 


0.46 (22s) 


0.42 (46s) 



Table 1: Performance comparison for different filter algorithms: we plot the RMSE and in 
brackets the computation time for two Galerkin filters and a particle filter. Here Nq represents 
the number of basis functions in the Galerkin approximation and Np the number of particle 
in the particle filter. AGAH(EM) respectively AGAH(SU) stands for the adaptive Galerkin 
approximation with respect to the Hermite basis, where we used the Euler-Maruyama (EM) 



approximation and the splitting- up approximation (SU) for Equation (16), respectively. The 
parameter values h — \ — Q.\ used in the experiment correspond to a relatively uninformative 
observation filtration; in computing the RMSE we used m = 100. 



functions such as the Gaussian basis proposed by Ahmed and Radaideh (1997)[ even if this 
might lead to a higher computational effort. 
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0.05 0.1 0.15 0.2 0.25 + 0.05 0.1 0.15 0.2 0.25 



Figure 2: Illustration of filtering and the value of point process information. We choose h = 5.5, A = 10. 
Left: trajectories of X and of x using the Galerkin method and particle filtering. Both methods perform 
well. In the right graph we illustrate the gain of using point process observations: we plot the trajectory 
of the conditional standard deviation at for the case with only continuous observation A = and with 
continuous and point process observations (A = 10, lower trajectory). The approximation by the two 
methods are very close in that case. Clearly, including point process information reduces the conditional 
standard deviation significantly in this example. 



Filter densities 



Differences 





Figure 3: Comparison of the theoretical filter density and the AG AH. We consider an n- dimensional 
Hermite basis and the case of purely continuous observations (A = 0). In this case, the filter problem 
has an explicit solution that can be computed with the Kalman-Bucy filter (KEF). Left: filter densities; 
right: approximation error. The adaptive Galerkin approximation (AGAH) is very close to the explicit 
solution for rt > 8. 
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5 1 , 1 , , , 1 Ql-" • • • • ' 

0.2 0.25 0.3 0.35 0.4 0.45 0.5 + 0.1 0.2 0.3 0.4 0.5 



Figiirc 4: Filter estimate (left) and conditional standard deviation (right) for a varying number n of 
Hermite basis functions in the adaptive Galerkin approximation (AGAH). Here h = 5.5, A = 10 and we 
use the AGAH with n = 4 and 8 basis functions. The case n = 4 shows a bad performance; the filter 
with 8 basis fimctions performs reasonably well. The right plot indicates that plots of the conditional 
variance can be a useful tool for determining if the number of basis functions used is appropriate. 




Figure 5: Comparison of the ordinary Galerkin approximation (GAH) with the adaptive Galerkin 
approximation (AGAH). Both approximations work with n = 20 Hermite basis functions. In this 
example h = 20, A = 10 (these parameter values correspond to a very low observation noise). Note that 
the ordinary Galerkin filter performs poorly, whereas the AGAH performs reasonably well. 
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Splitting-up approximation 



Euler-Maruyama approximation 




Figure 6: Comparison of splitting-up approximation (left) and Euler-Maruyama approximation (right). 
In this figure, we compare the rcsuhs obtained by these two methods for different A. The results 
obtained coincide for A = 10^"'', but the splitting-up approximation is more accurate when A is large. 
In particular, the splitting up method provides a good estimate even for A = 10~^. 




Figure 7: The multidimensional case. Comparison of the adaptive Galerkin approximation (AGAH) 
with the particle filter (PF) for the case of a multi-dimensional signal process X. We show plots of the 
conditional mean for a basis of size n = 4P = 1024. For the particle filter we took 10"^ particles. The 
computation time was 12 seconds (PF) and 9 seconds (AGAH). The results obtained by both methods 
are very close to each other. 
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n Time 



RMSE s.e. 



EDM s.e. 



EDV s.e. 





1 nn /II c 
iUU 41. o 

500 183 

1000 366 


U.iDoO U.4 m4 
0.1518 0.4285 
0.1530 0.4325 


U.Ui^y U.UbZi 
0.0039 0.0166 
0.0024 0.0109 


n nriQ n ni oo 
U.UUo U.UiZZ 

8e-04 0.0037 

4e-04 0.0016 


GAG 


25 10.9 
30 16.7 

50 46.8 
120 927 


0.3869 1.2997 
0.1632 0.4445 
U.iD4i U.44yo 
0.1850 0.6182 
0.1580 0.4276 


0.2316 1.0390 
0.0151 0.1061 
U.Uioy U.izoo 
0.0400 0.4320 
0.0075 0.0221 


0.6159 2.5379 

0.0030 0.0959 

n art A a f\ ^ qqk 
U.UU44 U.iooD 

0.2659 4.5433 

5e-04 0.0042 


PAW 


o i .y 
10 2.0 
12 2.1 
24 3.7 
40 7.2 


y.lDZD ZZo.oU 

2.0444 42.221 
0.6722 3.4018 
0.1494 0.4163 
0.1494 0.4162 


y.iooi Zoi.OO 

1.8825 42.117 
0.5415 3.1160 
0.0005 0.0019 
0.0004 0.0015 


n n/i'^s 1 9988 

U.U400 l.ZZoo 

0.1273 6.3878 
0.0045 0.0932 
0.0001 0.0017 
8.7e-05 0.0004 


AGAR 


8 23.3 
12 32.7 
16 50.8 


0.1518 0.4232 
0.1494 0.4193 
0.1493 0.4193 


0.0007 0.0029 
0.0006 0.0022 
0.0006 0.0023 


0.0002 0.0009 
9.3e-05 0.0004 
9.8e-05 0.0004 



Table 2: Performance of the different algorithms in terms of root mean square error (RMSE). 
We consider the particle filter (PF), the Galerkin approximation with Gaussian basis (GAG), 
the Galerkin approximation with Hermite basis (GAH) and the adaptive Galerkin approximation 
with Hermite basis (AGAH). The chosen parameter values are h = 5.5, a = l,and A = 10 with 
inital distribution Af{2,l) (as in Figure [5| and we consider the time-interval [0,0.5]. Besides 
computation time and RMSE we plot the estimated deviation |^ in mean (EDM) and estimated 
deviation in variance (EDV) by comparing the method to the solution of a particle filter with 10** 
basis functions and their standard errors. A low value suggests that the method is very close to 
the exact solution. The methods with the Hermite basis (GAH, AGAH) outperform the methods 
with the Gaussian basis (GAG) by a large scale: they are at the same time faster and more 
accurate. In this case with average precision GAH with at least 24 basis functions provides fast 
and accurate results. In a setup with lower observation noise, AGAH shows a better performance 
than GAH, compare Figure l5| 



^More precisely, we consider 

^ m L ^ m L 

EDM:= ^^^(X^ (t,)-l^(t,))' and EDV := ^ ^ ^(1/^ (t,) - (^,))^ 

j — l i— 1 J — 1 i~l 

where (ti) is the filtering estimate at time ti in the j'-th simulation and X-'{ti) is the result provided by a particle 
filter with 10* particles, which is close to the explicit solution of the problem. Similarly, V"' (ti) is the conditional 
variance obtained by filtering at time ti in the j-th simulation and V-' (ti) is the conditional variance obtained by 
branching particle filter with 10* particles. The number of simulations is to = 100. 
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A Additional Proofs 



Proof of Lemma \3.S\ Our aim is to show that for aU n G N it holds that 

^Si\\B\\^ + \\Cf)^ 
(n!) 2^ 



< 



(41) 



The operator L was defined in ( 24 1 . We rewrite as 

rt 



Jo 



with the (/ + l)-dimcnsional martingale AI := {Z^ , Y)^ and E{t, s) :— St-s{B^ , C)^. Iterative 
application of L gives that 

To compute |L"^|t = sup^^jQ-p] (E°(||(_L"^)(i)|||j)) note that the quadratic variation of M 
is < M >t— Ii+it where is the identity matrix on The Ito-isometry therefore yields 

E°(||(i"^)WllH) 



< 5*^(115112 4 

< 52"(||B||2 



E{tM)(^j^' E{hM){- 



t„-l 



\C\\ 



s 



2n/ 



\B\\ 



T 



■1 


T 

^0 

m 



Eih,t2){ 

t„-i 



E(tn-l,tn)^tJMt 

•tn-l 



dtr, 



dMt, 

E{tn-l,tn)it„dMt„ 

,dti 



dti 
dMt 



H 



dti 



n 



and we obtain (41). 



□ 



Proof of Proposition 4-3 As remarked after Theorem 3.1 in (21 1 the claim is proved if we can 
show that U„y„ is dense mV = H^{R'^). Here K = span{ei, . . . , e„} and V = H^iW^). Let 
be the set of smooth functions with compact support. Then, by Proposition 1 and Theorem 
4 in Bongioanni and Torrea (2006) there exist for all u g Cq° a sequence u„ € U„14 such that 

□ 



\U - Un\\v 



as n — >■ c». Since C§° is dense in V the claim follows. 



Next we turn to the determination of the coefficient matrices from Example 4.4 Our starting 
point are well-known recursive reslationships for Hermite polynomials defined in ( 33 ) : it holds 
that /■ = xfi - f,+i and {f,)' = ifi-i. This gives xfi{x) = ifi-i{x) + fi+i{x). These relations 
can be used to derive a number of useful relationships for the Hermite basis functions: 



Lemma A.l. For the Hermite basis (ei)i>i defined in (34 1 it holds that 

XBiix) = \Ji-\ ej_i(a;) + \Jie^^\{x) 

x^ei{x) = \/{i- l){i - 2) ei^2{x) + {2i - l)ei(x) + V + 1) £1+2(2^) 
1 



{e^y 
X {ei{x)y 



1 e,;_i 



Viej+i) 



^(v/(i-l)(i-2)e,_2 - e, ~ ^/i{i + 1) e,+2) . 
T(v/(i-l)(»-2)e,_2 + (1 - 2i) e, + ^/i{i + 1) ei+2) . 
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Proof. Regarding the first result, note that 



= Vi - lei_i(x) + \/iei+i(a;). 

Using this expression twice on x'^ei{x) = x{xei{x)), we obtain the second result. For the 
remaining two expressions we compute the derivative of and use the recursive expression for 
/j' given above. We obtain 



= ^xei{x) - Viei+i{x) = ^(Vi - lei_i(x) - Viei+i(a;)) 



and 



^{Vi^e',-i - Vie',+i) - ^ (v/(i - l)(i - 2) e,_2 + (1 - 2^) e, + + 1) e.+2). 



We conclude by noting that 

X {ei{x)y = xci^i ~ Vixci+i) 



□ 



Proof of Lemma 4-5 We start by observing that 



2 

= ^(V(j-l)(i-2)e,-2 - e, - Vj(j + l)e,+2) 

2 

+ y(v/0'-l)(j-2) + (1 - 2j) e^- + v/.?^ + 1) 6^+2) • 

The expression for (ei,^ej) now follows by orthonormality of the Hermite basis. In a similar 
way 

hxej{x) = h{^/j - lej-i{x) + j ej+i(x)), 
and the second expression follows. Finally, note that 

(Ax^ - \)e^{x) = A(V(i-l)(j- 2) 6^-2(2;) + (2j - l)e,(x) + /?(JTl) e,+2(a;)) - e,(x) 
and we conclude. □ 
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Proof of Lemma \4-S[ First, note that, 



(l,e,+i) = (27r) le ^ f.,{x)dx 

_ \/2(27r)-i 

_ \/2(27r)-i 



e ''■^ fi{V2x)dx 
^ ^ fro ' 

i k 

^^z?^2n^^ / ^e-^/,(x)/o(x)da 



V2(27r)4 

V k=0 j=o 



since any power of x can be represented as linear combination of the Hermite polynomials. 
The last step follows by orthonormality of the Hermite basis. An analogous argument with the 
constant function 1 replaced by x^ gives the result. □ 



Proof of Lemma \4.8\ The main difficulty is the noncentrality of go- Our main tool are the 
representations fj{x) — J2k=o'^i^'' ^^'^ ^* ~ ^\=o '-kfki^)- For arbitrary a and b, a non- 
central Hermite polynomial has a representation in terms of central Hermite polynomials as 
follows: 



fj{a + bx) = ^^l{a + bxf 

k=0 

L — n — n V / 



Using the representation x"^ = X^^Lo fr{^) leads to 

j k rn 



fe=0ni=0r=0 ^ ^ 



E(EE^i-Ur'=-^'"c)/.(:^) 

^Ar{j,a,b)fr(x). 



r=0 

To obtain our result, we start from 



(00161)= / — e ^"0 — - — r — T-j-e -1 fi-i(x)dx. 

Observe that 



{x - fioY 2_ _ - a) 
20-2 4 ~ 262 
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with a, b and d as specified in the Lemma. Furthermore, 

— ^ y e fj_i{x)dx = J ^^^ e fj-i{bx + a)dx 



= A(j - l,a,^) / <l^{x) fr{x)fo{x)dx 

r=0 

by orthogonality of the Hermite polynomials. Summarizing, we obtain that 



□ 
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